Deep neural networks (DNNs) are currently predominantly trained using first-order methods. Some of these methods (e.g., Adam, Adagrad, and RMSprop, and their variants) precondition the stochastic gradient using a diagonal matrix. Recently, effective second-order methods, such as KFAC, K-BFGS, Shampoo, and TNT, have been developed for training DNNs by preconditioning the stochastic gradient with layer-wise block-diagonal matrices. Here we propose an adaptive "mini-block Fisher (MBF)" preconditioning method that lies in between these two classes of methods. Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix, where for each layer in the DNN, whether it is convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size. Our new method leverages the parallelism of GPUs to efficiently perform computations on the large number of matrices in each layer. Consequently, MBF's per-iteration computational cost is only slightly higher than that of first-order methods. The performance of our proposed method is compared to that of several baseline methods on autoencoder and CNN problems, to validate its effectiveness both in terms of time efficiency and generalization power. Finally, an idealized version of MBF is proved to converge linearly.
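As a rough illustration of the mini-block idea (a sketch, not the authors' implementation), the following preconditions a layer's gradient with a block-diagonal empirical-Fisher approximation built from 2×2 mini-blocks; the block size, damping `lam`, and the moving-average scheme are all illustrative assumptions.

```python
# Toy mini-block Fisher preconditioning: split a layer's gradient into
# small blocks, estimate a per-block empirical Fisher from recent
# gradients, and precondition each block independently.

def inv2(m, lam):
    # Inverse of a damped 2x2 matrix m + lam*I (closed form).
    a, b = m[0][0] + lam, m[0][1]
    c, d = m[1][0], m[1][1] + lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def update_fisher(fisher_blocks, grad, beta=0.9):
    # Exponential moving average of g g^T within each 2x2 mini-block.
    for k, F in enumerate(fisher_blocks):
        g = grad[2 * k: 2 * k + 2]
        for i in range(2):
            for j in range(2):
                F[i][j] = beta * F[i][j] + (1 - beta) * g[i] * g[j]

def precondition(grad, fisher_blocks, lam=1e-3):
    # Apply each mini-block's damped inverse Fisher to its gradient slice.
    out = []
    for k, F in enumerate(fisher_blocks):
        g = grad[2 * k: 2 * k + 2]
        Finv = inv2(F, lam)
        out += [Finv[0][0] * g[0] + Finv[0][1] * g[1],
                Finv[1][0] * g[0] + Finv[1][1] * g[1]]
    return out

# One step on a 4-parameter "layer" (two mini-blocks).
fisher = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
g = [0.4, -0.2, 1.0, 0.5]
update_fisher(fisher, g)
step = precondition(g, fisher)
print(step)
```

Because every mini-block is tiny and independent, the per-block inversions are exactly the kind of batched small-matrix work that maps well onto a GPU, which is the efficiency argument the abstract makes.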
Although first-order methods are predominantly used to train deep learning models, second-order methods, and in particular natural gradient methods, remain of interest because of their potential for accelerating training through the use of curvature information. Several methods with non-diagonal preconditioning matrices, including KFAC, Shampoo, and K-BFGS, have been proposed and shown to be effective. Based on the so-called tensor normal (TN) distribution, we propose and analyze a brand new approximate natural gradient method, Tensor Normal Training (TNT), which, like Shampoo, only requires knowledge of the shape of the training parameters. By approximating the probabilistically based Fisher matrix, as opposed to the empirical Fisher matrix, our method uses the block-wise covariance of the sampling-based gradient as the preconditioning matrix. Moreover, the assumption that the sampling-based (tensor) gradient follows a TN distribution ensures that its covariance has a Kronecker-separable structure, which leads to a tractable approximation to the Fisher matrix. Consequently, TNT's memory requirements and per-iteration computational costs are only slightly higher than those of first-order methods. In our experiments, TNT exhibited superior optimization performance to state-of-the-art first-order methods, and comparable optimization performance to the state-of-the-art second-order methods KFAC and Shampoo. Moreover, TNT demonstrated its ability to generalize as well as first-order methods, while using fewer epochs.
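The tractability claim rests on a standard Kronecker identity: preconditioning a matrix-shaped gradient G with (B ⊗ A)⁻¹ only requires inverting the two small factors, since (B ⊗ A)⁻¹ vec(G) = vec(A⁻¹ G B⁻¹) for symmetric B. The sketch below verifies this numerically with invented 2×2 factors (the matrices are illustrative, not from the paper):

```python
# Why a Kronecker-separable covariance is tractable: the big inverse
# (B ⊗ A)^{-1} reduces to two small inverses, A^{-1} and B^{-1}.

def inv2(m):
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    return [[x[i][0] * y[0][j] + x[i][1] * y[1][j] for j in range(2)]
            for i in range(2)]

def vec(M):
    # Column-stacking vectorization of a 2x2 matrix.
    return [M[0][0], M[1][0], M[0][1], M[1][1]]

def kron(B, A):
    # 4x4 Kronecker product B ⊗ A.
    K = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    K[2 * i + k][2 * j + l] = B[i][j] * A[k][l]
    return K

def matvec(K, v):
    return [sum(K[i][j] * v[j] for j in range(4)) for i in range(4)]

A = [[2.0, 0.3], [0.3, 1.0]]    # small symmetric factor (illustrative)
B = [[1.5, -0.2], [-0.2, 0.8]]  # small symmetric factor (illustrative)
G = [[0.5, 1.0], [-0.4, 0.2]]   # matrix-shaped "gradient"

step = matmul2(matmul2(inv2(A), G), inv2(B))   # A^{-1} G B^{-1}
recovered = matvec(kron(B, A), vec(step))      # (B ⊗ A) vec(step)
print(recovered, vec(G))                       # the two should match
```

For a weight matrix of size m × n, this replaces an (mn) × (mn) inverse with one m × m and one n × n inverse, which is why memory and per-iteration cost stay close to first-order methods.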
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending toward different local minima to each other (i.e., to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient, since it broadcasts only the parameters of the leader rather than of all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and of its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in the group), and update each worker with a corrective direction comprised of two attractive forces: one toward the local leader, and one toward the global leader (the best performer among all workers). The multi-leader setting aligns well with current hardware architectures, where the local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.
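A minimal sketch of the two-force update described above, on a toy one-dimensional objective; the number of workers, step size `eta`, and pulling coefficient `lam` are illustrative assumptions, and the real algorithm runs distributed and (in LSGD) with stochastic gradients.

```python
import random
random.seed(0)

def f(x):                 # toy objective with minimizer x = 3
    return (x - 3.0) ** 2

def grad(x):
    return 2.0 * (x - 3.0)

workers = [random.uniform(-10, 10) for _ in range(4)]
eta, lam = 0.05, 0.1

for _ in range(200):
    leader = min(workers, key=f)             # best-performing worker
    workers = [x - eta * grad(x)             # regular gradient force
                 - eta * lam * (x - leader)  # attraction toward the leader
               for x in workers]

print(workers)  # all workers end up near the minimizer x = 3
```

Note that only the leader's parameters enter every worker's update, which is the communication-efficiency point (iv) above: one broadcast per round instead of an all-to-all average.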
The pandemic of recent years has led to a dramatic increase in people wearing protective masks in public venues. This poses obvious challenges to the pervasive use of face recognition technology, which is now suffering a decline in performance. One way to address the problem is to revert to face recovery methods as a preprocessing step. Current approaches to face reconstruction and manipulation leverage the ability to model the face manifold, but tend to be generic. We introduce a method that is specific to recovering the face image from an image of the same individual wearing a mask. We do so by designing a specialized GAN inversion method, based on an appropriate set of losses for learning an unmasking encoder. With extensive experiments, we show that the approach is effective at unmasking face images. In addition, we show that the identity information is preserved sufficiently well to improve face verification performance on several face recognition benchmark datasets.
Differentiable Search Indices (DSIs) encode a corpus of documents in the parameters of a model and use the same model to map queries directly to relevant document identifiers. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12\%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting by a significant margin. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
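The first mitigation above optimizes for flatter loss basins. One standard way to do this (a sketch of a sharpness-aware-minimization-style step, not necessarily the authors' exact procedure) is to perturb the weights toward higher loss and then descend using the gradient taken at the perturbed point, shown here on a toy one-dimensional loss:

```python
# SAM-style update on a toy loss: first move the weight a small radius
# rho along the gradient direction (toward higher loss), then take the
# descent step using the gradient at that perturbed point.

def loss(w):
    return (w - 1.0) ** 2

def grad(w):
    return 2.0 * (w - 1.0)

w, eta, rho = 5.0, 0.1, 0.05
for _ in range(100):
    g = grad(w)
    eps = rho * (1.0 if g >= 0 else -1.0)   # g / |g| in one dimension
    w = w - eta * grad(w + eps)             # descend from the perturbed point

print(w)  # settles close to the minimizer w = 1
```

Descending from the worst nearby point biases the iterates toward wide basins, where small perturbations barely change the loss, which is the property the abstract links to less forgetting.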
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
For agglomerative hierarchical clustering on a set equipped with a distance function, we propose a clustering termination procedure that is locally adaptive with respect to the hierarchical tree of sets produced by the agglomerative merging. It represents a multi-scale alternative to conventional scale-dependent, threshold-based termination criteria.
Instance-level image retrieval (IIR), or simply instance retrieval, deals with the problem of finding all images in a dataset that contain a query instance (e.g., an object). This paper makes the first attempt to tackle this problem using instance-discrimination-based contrastive learning (CL). While CL has shown impressive performance for many computer vision tasks, similar success has never been found in the field of IIR. In this work, we approach the problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models. We first investigate the efficacy of transfer learning for IIR by comparing off-the-shelf features learned by a pre-trained deep neural network (DNN) classifier with features learned by a CL model. These findings inspired us to propose a new training strategy that optimizes CL toward learning IIR-oriented features, by using an Average Precision (AP) loss together with a fine-tuning method to learn contrastive feature representations tailored to IIR. Our empirical evaluation demonstrates significant performance improvements over off-the-shelf features learned from a pre-trained DNN classifier on the challenging Oxford and Paris datasets.
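Average Precision, the quantity the AP loss above targets, has a standard definition for a ranked retrieval list: average the precision at each rank where a relevant item appears. The sketch below computes it on hypothetical cosine-similarity scores (this is the plain metric, not the paper's differentiable surrogate):

```python
# Average Precision of a ranking: sort database items by similarity to
# the query, then average precision at the ranks of the relevant items.

def average_precision(scores, relevant):
    # scores: similarity of each database image to the query
    # relevant: parallel list of 0/1 ground-truth labels
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total, ap = 0, sum(relevant), 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            ap += hits / rank        # precision at this rank
    return ap / total

scores   = [0.9, 0.2, 0.75, 0.4, 0.6]   # hypothetical similarities
relevant = [1,   0,   1,    1,   0]     # hypothetical ground truth
print(average_precision(scores, relevant))  # ≈ 0.917
```

Because AP rewards placing all relevant images above all irrelevant ones, directly optimizing (a smoothed version of) it aligns training with the retrieval objective rather than with instance discrimination alone.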
Robots operating at night with conventional vision cameras face significant challenges in reconstruction due to noise-limited images. Previous work has demonstrated that burst imaging techniques can be used to partially overcome this issue. In this paper, we develop a novel feature detector that operates directly on image bursts, enhancing vision-based reconstruction under extremely low-light conditions. Our approach finds keypoints with well-defined scale and apparent motion within each burst by jointly searching a multi-scale and multi-motion space. Because we describe these features at a stage where the images have a higher signal-to-noise ratio, the detected features are more accurate than the state of the art on conventional noisy images and on burst-merged images, and exhibit high precision, recall, and matching performance. We show improved feature performance and camera pose estimates, and demonstrate improved structure from motion using our feature detector in challenging light-constrained scenes. Our feature finder provides a significant step toward robots operating in low-light scenarios and applications including night-time operations.
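A toy version of the underlying idea of searching over apparent motion before describing features: align the frames of a noisy burst by searching candidate shifts against a reference, then merge them so the signal-to-noise ratio rises. The 1-D "scene", burst size, noise level, and shift search range are all invented for illustration; the paper's detector searches scale and motion jointly in 2-D.

```python
import random
random.seed(1)

# A clean 1-D "scene" with one sharp feature, observed as a burst of
# noisy, shifted frames.
N = 64
scene = [1.0 if 30 <= i <= 33 else 0.0 for i in range(N)]
true_shifts = [0, 2, -1, 3]
frames = [[scene[(i - s) % N] + random.gauss(0, 0.25) for i in range(N)]
          for s in true_shifts]

def score(ref, f, t):
    # Correlation of the reference with frame f shifted back by t.
    return sum(ref[i] * f[(i + t) % N] for i in range(N))

# Multi-motion search: for each frame, pick the candidate shift that
# best aligns it with the reference frame, then average the burst.
ref = frames[0]
est_shifts = [max(range(-5, 6), key=lambda t: score(ref, f, t)) for f in frames]
aligned = [[f[(i + t) % N] for i in range(N)] for f, t in zip(frames, est_shifts)]
merged = [sum(vals) / len(frames) for vals in zip(*aligned)]
peak = max(range(N), key=lambda i: merged[i])
print(est_shifts, peak)
```

Averaging k aligned frames reduces the noise standard deviation by roughly a factor of √k while preserving the feature, which is why detecting and describing keypoints at this merged, higher-SNR stage is more reliable than detecting on a single noisy frame.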
We introduce StreamNet, an autoencoder architecture for the analysis of the highly heterogeneous geometry of large collections of white matter streamlines. The proposed framework takes advantage of the geometry-preserving properties of the Wasserstein-1 metric in order to achieve direct encoding and reconstruction of entire bundles of streamlines. We show that the model not only accurately captures the distributive structure of streamlines across populations, but is also able to achieve excellent reconstruction performance between real and synthetic streamlines. Model performance is evaluated on white matter streamlines resulting from T1-weighted diffusion imaging of 40 healthy controls, using state-of-the-art bundle comparison metrics.
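The Wasserstein-1 metric on which StreamNet's objective relies has a simple closed form for two equal-size point sets on the real line: the optimal transport plan matches sorted samples, so the distance is the mean of the pointwise gaps. A minimal illustration (the sample values are invented and unrelated to the diffusion-imaging data):

```python
# Wasserstein-1 distance between two equal-size empirical distributions
# on the real line: sort both sample sets and average the pointwise
# distances between matched samples.

def wasserstein1(xs, ys):
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

a = [0.0, 1.0, 2.0]
b = [0.5, 1.5, 2.5]
print(wasserstein1(a, b))  # each sample moves 0.5, so the distance is 0.5
```

Unlike a pointwise mean-squared error, this distance compares the distributions of points rather than fixed correspondences, which is the "geometry-preserving" property that lets the model score reconstructions of whole streamline bundles.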